Categorizing Unknown Text Patterns for Information Extraction Using a Search Result Mining Approach

Authors

  • Chien-Chung Huang
  • Shui-Lung Chuang
  • Lee-Feng Chien
Abstract

An advanced information extraction system requires an effective text categorization technique to categorize extracted facts (text patterns) into a hierarchy of domain-specific topic categories. Text patterns are often short, and their categorization is quite different from conventional document categorization. This paper proposes a Web mining approach that exploits Web resources to categorize unknown text patterns with limited manual intervention. The feasibility and wide adaptability of the proposed approach have been shown with extensive experiments on categorizing different kinds of text patterns, including domain-specific terms, named entities, and even paper titles, into Yahoo!’s taxonomy trees.

Similar resources

A Model for Extracting Information from Text Documents, Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Opinion Mining in Hungarian based on textual and graphical clues

Opinion Mining aims at recognizing and categorizing or extracting opinions found in unstructured text resources and is one of the most dynamically evolving subdisciplines of Computational Linguistics, showing some resemblance to document classification and information extraction tasks. In this paper we propose a novel approach in Opinion Mining which combines Machine Learning models based on trad...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns that have a high utility meeting a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

Ontology-Based Interactive Information Extraction

Interactive Information Extraction brings together search and information extraction to provide fast, interactive text mining over large volumes of text such as Medline abstracts, full text scientific articles, patents etc. As well as covering the two ends of the spectrum: keyword search over documents, and detailed linguistic patterns within sentences, the Interactive Information Extraction Sy...

Publication date: 2004